With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large amounts of computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To this end, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with those of large language models such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.
Translated by Google Translate
Detecting sarcasm and verbal irony in people's subjective statements is crucial to understanding their intended meanings and their real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution fine-tunes pre-trained language models, such as ERNIE-M and DeBERTa, in multilingual settings to recognize irony in Arabic and English texts. Our system ranked second out of 43 and ninth out of 32 in Task A: one-sentence detection in English and Arabic; fifth out of 22 in Task B: binary multi-label classification in English; and first out of 16 and fifth out of 13 in Task C: sentence-pair detection in English and Arabic.
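Unsupervised generation of virtual humans with various appearances and animatable poses is important for creating 3D human avatars and other AR/VR applications. Existing methods are either limited to rigid object modeling or are not generative, and thus cannot synthesize high-quality virtual humans and animate them. In this work, we propose AvatarGen, the first method that not only generates non-rigid humans with diverse appearances but also offers full control over poses and viewpoints, while requiring only 2D images for training. Specifically, it extends recent 3D GANs to clothed human generation by using a coarse human body model as a proxy to warp the observation space into a standard avatar in a canonical space. To model non-rigid dynamics, it introduces a deformation network that learns pose-dependent deformations in the canonical space. To improve the geometric quality of the generated human avatars, it uses a signed distance field as the geometric representation, which allows more direct regularization of geometry learning by the body model. Benefiting from these designs, our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs. Furthermore, it is suitable for many applications, such as single-view reconstruction, reanimation, and text-guided synthesis. Code and pre-trained models will be made available.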
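Deep learning methods have been shown to be effective in representing ground-state wave functions of quantum many-body systems. Existing methods use convolutional neural networks (CNNs) for square lattices because of their image-like structure. For non-square lattices, existing methods use graph neural networks (GNNs), in which structural information is not precisely captured, so additional hand-crafted sublattice encoding is required. In this work, we propose lattice convolutions, in which a set of proposed operations converts non-square lattices into grid-like augmented lattices on which regular convolutions can be applied. Based on the proposed lattice convolutions, we design lattice convolutional networks (LCN) that use self-gating and attention mechanisms. Experimental results show that our method performs on par with or better than existing methods on the spin-1/2 $J_1$-$J_2$ Heisenberg model over square, honeycomb, triangular, and kagome lattices, without using hand-crafted encoding.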
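Although existing face anti-spoofing (FAS) methods achieve high accuracy in intra-domain experiments, their performance drops severely in cross-domain scenarios because of poor generalization. Recently, multiple techniques have been explored, such as domain generalization and representation disentanglement. However, the improvement remains limited by two issues: 1) It is difficult to perfectly map all faces to a shared feature space. If faces from unknown domains are not mapped to known regions of the shared feature space, inaccurate predictions result. 2) It is hard to fully account for the various spoof traces needed for disentanglement. In this paper, we propose a feature generation and hypothesis verification framework to alleviate these two issues. Above all, feature generation networks that generate hypotheses of real faces and known attacks are introduced to the FAS task for the first time. Subsequently, two hypothesis verification modules are applied to judge whether the input face comes from the real-face space and from the real-face distribution, respectively. Furthermore, we analyze the relationship between our framework and Bayesian uncertainty estimation, which provides theoretical support for reliable defense in unknown domains. Experimental results show that our framework achieves promising results and outperforms state-of-the-art approaches on public datasets.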
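We present neural radiance fields for rendering and temporal (4D) reconstruction of humans, as captured by a sparse set of cameras or even from a monocular video. Our approach combines ideas from neural scene representation, novel-view synthesis, and implicit statistical geometric human representations, coupled through novel loss functions. Instead of learning a radiance field with a uniform occupancy prior, we constrain it with a structured implicit human body model represented using signed distance functions. This allows us to robustly fuse information from sparse views and to generalize beyond the poses or views observed in training. Moreover, we apply geometric constraints to jointly learn the structure of the observed subject (including both body and clothing) and to regularize the radiance field toward geometrically plausible solutions. Extensive experiments on multiple datasets demonstrate the robustness and accuracy of our approach, its ability to generalize significantly beyond a range of poses and views, and its statistical extrapolation beyond the observed shapes.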
Diffusion models, a new generative modelling paradigm, have achieved great success in image, audio, and video generation. However, given the discrete, categorical nature of text, extending continuous diffusion models to natural language is not trivial, and text diffusion models are less studied. Sequence-to-sequence text generation is one of the essential natural language processing topics. In this work, we apply diffusion models to sequence-to-sequence text generation and explore whether the superior generation performance of diffusion models can transfer to the natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformer architecture to model the denoising function. To improve generation quality, SeqDiffuSeq combines the self-conditioning technique with a newly proposed adaptive noise schedule technique. The adaptive noise schedule distributes the difficulty of denoising evenly across time steps and considers exclusive noise schedules for tokens at different positions. Experimental results show good performance on sequence-to-sequence generation in terms of text quality and inference time.
In this paper, we introduce a novel optimization algorithm for machine learning model training called Normalized Stochastic Gradient Descent (NSGD), inspired by Normalized Least Mean Squares (NLMS) from adaptive filtering. When a high-complexity model is trained on a large dataset, the choice of learning rate is critical, because poorly chosen optimizer parameters can lead to divergence. The algorithm updates the network weights using the stochastic gradient, but with $\ell_1$- and $\ell_2$-based normalizations of the learning rate parameter, similar to the NLMS algorithm. Our main difference from existing normalization methods is that we do not include the error term in the normalization process; we normalize the update term using the input vector to the neuron. Our experiments show that, with our optimization algorithm, the model can be trained to a better accuracy level under different initial settings. We demonstrate the efficiency of our training algorithm using ResNet-20 and a toy neural network on different benchmark datasets with different initializations. NSGD improves the accuracy of ResNet-20 from 91.96\% to 92.20\% on the CIFAR-10 dataset.
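As a rough illustration of the NLMS-style normalization mentioned in this abstract, the sketch below applies one normalized update to a single linear neuron. It is not the paper's NSGD rule, which the abstract does not spell out: the function name `nlms_style_step`, the squared-error loss, the `eps` stabilizer, and the exact form of the $\ell_1$ and $\ell_2$ scaling are all assumptions made for illustration.

```python
def nlms_style_step(weights, inputs, target, lr=1.0, norm="l2", eps=1e-8):
    """One input-normalized gradient step for a single linear neuron.

    Hypothetical helper for illustration; the normalization uses only the
    input vector, not the error term.
    """
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = target - prediction
    # Gradient of the assumed loss 0.5 * error**2 with respect to each weight.
    grad = [-error * x for x in inputs]
    if norm == "l2":
        scale = sum(x * x for x in inputs)   # squared l2 norm, as in classic NLMS
    elif norm == "l1":
        scale = sum(abs(x) for x in inputs)  # l1 norm (assumed variant)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    step = lr / (eps + scale)                # learning rate normalized by the input
    return [w - step * g for w, g in zip(weights, grad)]


# Usage: with lr=1 and l2 normalization, a single step moves the neuron's
# output for this input very close to the target (the classic NLMS property).
w = nlms_style_step([0.0, 0.0], [3.0, 4.0], target=10.0)
prediction_after = 3.0 * w[0] + 4.0 * w[1]
```

Because the step is divided by the input's norm, the effective step size does not grow with the magnitude of the input, which is the motivation usually given for NLMS-type updates.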
Language models with the Transformer structure have shown great performance in natural language processing. However, problems such as over-fitting and representation collapse still arise when pre-trained language models (PLMs) are fine-tuned on downstream tasks. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making the Transformer layers more robust to perturbations of hidden representations can further benefit the fine-tuning of PLMs en bloc. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and it is both better than and compatible with previous state-of-the-art fine-tuning techniques.
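The idea of perturbing hidden representations between layers can be sketched with a toy forward pass. This is only a minimal illustration under stated assumptions, not HyPe itself: the abstract does not give the noise distribution, its scale, or where exactly it is injected, so the helper names `perturb_hidden` and `forward`, the Gaussian and uniform noise options, and the toy list-based "layers" are all hypothetical.

```python
import random


def perturb_hidden(hidden, scale, rng, kind="normal"):
    """Add element-wise random noise to a hidden vector (assumed noise forms)."""
    if kind == "normal":
        noise = [rng.gauss(0.0, scale) for _ in hidden]
    elif kind == "uniform":
        noise = [rng.uniform(-scale, scale) for _ in hidden]
    else:
        raise ValueError("kind must be 'normal' or 'uniform'")
    return [h + n for h, n in zip(hidden, noise)]


def forward(x, layers, scale=0.0, rng=None, training=False, kind="normal"):
    """Run toy layers in order, perturbing each layer's input during training."""
    hidden = x
    for layer in layers:
        if training and scale > 0.0:
            hidden = perturb_hidden(hidden, scale, rng, kind)
        hidden = layer(hidden)
    return hidden


# Usage with two toy "layers" standing in for Transformer layers.
layers = [lambda h: [2.0 * v for v in h], lambda h: [v + 1.0 for v in h]]
clean = forward([1.0, 2.0], layers)                      # no noise at inference
noisy = forward([1.0, 2.0], layers, scale=1e-3,
                rng=random.Random(0), training=True)     # noise during training
```

In this sketch the noise is applied only in training mode and the inference path is unchanged, which is one plausible reading of why such a scheme would add little computational overhead.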
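Dog owners are typically able to recognize behavioral cues that reveal the subjective states of their dogs, such as pain. However, automatic recognition of the pain state is very challenging. This paper proposes a novel video-based, two-stream deep neural network approach to this problem. We extract and preprocess body keypoints, and compute features from both the keypoints and the RGB representation over the video. We propose an approach for handling self-occlusions and missing keypoints. We also present a unique video-based dog behavior dataset, collected by veterinary professionals and annotated for the presence of pain, and report good classification results with the proposed approach. This study is one of the first works on machine-learning-based estimation of dog pain state.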